Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics
Authors
Abstract
At the core of Machine Learning (ML) analytics applied to Big Data is often an expert-suggested model, whose parameters are refined by iteratively processing a training dataset until convergence. The completion time (i.e., convergence time) and the quality of the learned model depend not only on the rate at which refinements are generated but also on the quality of each refinement. While data-parallel ML applications often employ a loose consistency model when updating shared model parameters to maximize parallelism, the accumulated error may seriously impact the quality of refinements and thus delay completion, a problem that usually gets worse with scale. Although more immediate propagation of updates reduces the accumulated error, this strategy is limited by physical network bandwidth. Additionally, the performance of the widely used stochastic gradient descent (SGD) algorithm is sensitive to the initial step size, so simply increasing communication without adjusting the step size accordingly fails to achieve optimal performance.
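The interplay between stale updates and step size described in the abstract can be illustrated with a toy sketch. This is not the method of the paper: it assumes a one-dimensional objective f(w) = 0.5 * w^2, plain (non-stochastic) gradient descent, and a fixed lag between the parameter value a gradient was computed from and the value it is applied to. The names delayed_gradient_descent, tail_error, step_size, and staleness are hypothetical and chosen only for illustration.

```python
def delayed_gradient_descent(step_size, staleness, steps=200, w0=1.0):
    """Minimize f(w) = 0.5 * w**2 using gradients computed from stale parameters.

    Assumption for illustration: every gradient is evaluated at the parameter
    value from `staleness` updates ago (staleness=0 means fully fresh).
    Returns the full parameter trajectory as a list.
    """
    # Pad the history so that a stale value exists for the first updates.
    history = [w0] * (staleness + 1)
    for _ in range(steps):
        stale_w = history[-(staleness + 1)]
        grad = stale_w  # gradient of 0.5 * w**2 evaluated at the stale value
        history.append(history[-1] - step_size * grad)
    return history


def tail_error(history, window=20):
    """Largest distance from the optimum (w = 0) over the last `window` iterates."""
    return max(abs(w) for w in history[-window:])


if __name__ == "__main__":
    for step_size, staleness in [(0.5, 0), (0.5, 4), (0.1, 4)]:
        err = tail_error(delayed_gradient_descent(step_size, staleness))
        print(f"step_size={step_size} staleness={staleness} tail_error={err:.3e}")
```

Under these assumptions, a step size that converges with fresh parameters diverges once gradients lag by four updates, while a smaller step size tolerates the same lag. This mirrors, in a much simplified setting, the abstract's point that communication and step size need to be tuned together.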
Similar resources
Big Data Analytics in Bioinformatics: A Machine Learning Perspective
Bioinformatics research is characterized by voluminous and incremental datasets and complex data analytics methods. The machine learning methods used in bioinformatics are iterative and parallel. These methods can be scaled to handle big data using the distributed and parallel computing technologies. Usually big data tools perform computation in batch-mode and are not optimized for iterative pr...
Application of Big Data Analytics in Power Distribution Network
Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
Fast System Matrix Calculation in CT Iterative Reconstruction
Introduction: Iterative reconstruction techniques provide better image quality and have the potential for reconstructions with lower imaging dose than classical methods in computed tomography (CT). However, the computational speed is a major concern for these iterative techniques. The system matrix calculation during the forward- and back projection is one of the most time-cons...
Distributing the Data Plane for Remote Storage Access
Sub-microsecond network and memory latencies require fast user-level access to local and remote storage. While user-level access to local storage has been demonstrated recently, it does currently not extend to serverless parallel systems in datacenter environments. We propose direct user-level access to remote storage in a distributed setting, unifying fast data access and high-performance remo...
Exploiting Bounded Staleness to Speed Up Big Data Analytics
Many modern machine learning (ML) algorithms are iterative, converging on a final solution via many iterations over the input data. This paper explores approaches to exploiting these algorithms’ convergent nature to improve performance, by allowing parallel and distributed threads to use loose consistency models for shared algorithm state. Specifically, we focus on bounded staleness, in which e...
Journal title:
Volume, issue:
Pages: -
Publication date: 2015